HIGH THROUGH-PUT CLONING OF PROTOONCOGENES

General Background 

Cancer is the phenotypic manifestation of a complex biological progression
during which cells accumulate multiple somatic mutations, eventually acquiring
sufficient growth autonomy to metastasize. Although inherited cancer susceptibility
alleles and epigenetic factors influence the process, carcinogenesis is fundamentally
driven by somatic cell evolution (i.e., mutation and natural selection of variants with
progressive loss of growth control). The genes that are the targets of these somatic
mutations are classified as either protooncogenes or tumor suppressor genes,
depending on whether their mutant phenotypes are dominant or recessive, respectively.

In several animal models, an important source of protooncogene somatic
mutations is retrovirus infection. Retroviruses can cause cancer by essentially
three mechanisms: (i) transduction of host protooncogenes (which then become viral
oncogenes), (ii) trans-acting effects of viral gene products, or (iii) cis-acting effects of
provirus integration on protooncogenes at or very near the site of integration. In the
latter case, only rare infected cells are affected. This phenomenon is called provirus
insertion mutation, and will be discussed in detail in the following narrative.

As a normal consequence of the retroviral life-cycle, DNA copies of the
retrovirus genome (called proviruses) are integrated into the host genome.
Accordingly, retroviruses are obligate mutagens. A newly-integrated provirus can
affect gene expression in cis at or near the integration site by one of two mechanisms.
Type I insertion mutations up-regulate transcription of proximal genes as a
consequence of regulatory sequences (enhancers and/or promoters) within the
proviral long terminal repeats (LTRs). These insertion mutations typically affect genes
that are not expressed in the target tissue. Type II insertion mutations cause
truncation of coding regions due to either integration directly within an open reading
frame or integration within an intron upstream of the stop codon.

Provirus integration is random. Therefore, all host genes are targets of insertion
mutation. In a chronically-infected tissue, a sufficient number of cells have new
provirus insertions that, statistically, all genes in the genome are mutated. In rare
cases, an insertion mutation will "activate" a host protooncogene, providing the
affected cell with a dominant selective growth advantage in vivo. If the cell progresses
to cancer, then the protooncogene insertion mutation will be present at clonal
stoichiometry in the tumor. Such "clonally-integrated" proviruses serve to "tag" the
locations of protooncogenes in the genome. In cases where the proviral enhancer is
responsible for dysregulation of the mutated protooncogene, the protooncogene can be
100 kb or more from the site of integration (but is usually much closer).

This relatively tight linkage between clonally-integrated proviruses and
protooncogenes is the basis for a classical experimental strategy, called "provirus
tagging," in which slow-transforming retroviruses that act by an insertion mutation
mechanism are used to isolate protooncogenes. The complete logic is as follows:
(i) uninfected animals have low cancer rates, (ii) infected animals have high cancer
rates, (iii) the retroviruses involved do not carry transduced host protooncogenes or
pathogenic trans-acting viral genes, (iv) the cancer incidence must therefore be a direct
consequence of provirus integration effects on host protooncogenes, (v) since provirus
integration is random, rare integrants will "activate" host protooncogenes that provide a
selective growth advantage, and (vi) these rare events result in new proviruses at
clonal stoichiometries in tumors.

In contrast to mutations caused by chemicals, radiation, or spontaneous errors,
protooncogene insertion mutations can be easily located because a conveniently
sized genetic marker of known sequence (i.e., the provirus) is present at the site of
mutation. Host sequences that flank clonally-integrated proviruses can be
recovered using a variety of molecular techniques. Once these sequences are in
hand, the tagged protooncogenes can be subsequently identified.

There are two unequivocal biological criteria that provide prima facie evidence
that a protooncogene is present at or very near a proviral integration site. The first
criterion is the presence of provirus at the same locus in two or more independent
tumors. This is because the genome is too large for random integrations to result in
observable clustering. Any clustering that is detected is indirect evidence for biological
selection (i.e., the tumor phenotype resulting from activation of a host protooncogene).
The second criterion is a tumor with only a single insertion mutation. In this case, the
single provirus must be located at a protooncogene locus. If either of these criteria
is met, sufficient evidence exists to reach a
conclusion that a protooncogene locus has been located.
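
The first criterion can be put in rough numerical terms. The sketch below is a back-of-the-envelope estimate only. It assumes that integration is uniformly random, a haploid genome of about 2.7 x 10^9 bp (approximately the size of the mouse genome), and a window of 100 kb on either side of a provirus. None of these figures is taken from this disclosure, and the function names are illustrative.

```python
def pairwise_colocalization_probability(window_bp: float, genome_bp: float) -> float:
    """Chance that a second random integration lands within +/- window_bp of a first."""
    return min(1.0, 2.0 * window_bp / genome_bp)


def expected_chance_pairs(n_integrations: int, window_bp: float, genome_bp: float) -> float:
    """Expected number of integration pairs that co-localize purely by chance."""
    n_pairs = n_integrations * (n_integrations - 1) / 2
    return n_pairs * pairwise_colocalization_probability(window_bp, genome_bp)


# Assumed values for illustration, not taken from the source text.
GENOME_BP = 2.7e9   # approximate haploid mouse genome size
WINDOW_BP = 1.0e5   # 100 kb on either side of the first provirus

p = pairwise_colocalization_probability(WINDOW_BP, GENOME_BP)
print(f"chance co-localization of two integrations: {p:.1e}")
```

Under these assumptions, two independent random integrations fall within the same window with a probability of less than one in ten thousand. The expected number of chance pairs grows with the number of integrations examined, which `expected_chance_pairs` makes explicit.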

The provirus tagging concept has withstood two decades of testing in many
retrovirus tumor models that have a provirus insertion mutation etiology. The
biological logic is so compelling, and the experimental results so unequivocal, that the
claim can be made that the activated genes are functionally validated as
protooncogenes at the time of discovery. Formal confirmation typically involves
isolation of a full-length cDNA for use in a bioassay (either a cell-based transformation
assay or transgenic mice).

Provirus tagging in avian and mammalian systems has led to the identification
of approximately 50-60 protooncogenes (many of which were new genes not
previously identified by other techniques). The three mammalian retroviruses that
cause cancer by an insertion mutation mechanism are FeLV (leukemia/lymphoma in
cats), MLV (leukemia/lymphoma in mice and rats), and MMTV (mammary cancer in
mice).

Despite the tremendous promise of the provirus tagging approach, as originally
designed it was not well-suited for large scale application. The main problem was that
it was too laborious and, therefore, the risks of reisolating known genes became
unacceptable for most investigators. As a consequence, the protooncogene discovery
potential of this approach has remained largely untapped.

Recognizing this untapped potential, we designed and implemented HPT to
overcome the limitations of the original provirus tagging approach (which were all
fundamentally related to throughput). We were able to successfully increase provirus
tagging throughput to the point where reisolation of known loci is no longer a problem.
In fact, this is now a desirable outcome because it serves as an "internal control" that
helps validate the biological relevance of the new genes that are recovered in parallel.

As a functional oncogenomics strategy, HPT has many advantages. First, it is
a functional cloning rather than brute-force (e.g., differential display-based) approach,
and the genes that are recovered are functionally validated at the time of discovery.
Second, it has high biological relevance since protooncogenes are isolated directly
from clinical material (rather than from cell lines, transplants, or materials generated by
gene transfer). Third, it is amenable to automation, meaning that throughput and
time-to-discovery are simple functions of research resources.

The invention is a process called high-throughput provirus tagging (HPT). HPT
yields partial protooncogene cDNAs from retrovirus-induced tumors. Using these
partial cDNAs, conventional techniques can be used to recover full-length cDNAs (we
have not yet performed this final step). A conceptual diagram is shown in Appendix A
and a flow chart of the process is shown in Appendix B.

HPT is derived from classical procedures for provirus tagging (see Appendix C
for background information). It is specific for tumors induced by retroviruses that
cause cancer via a provirus insertion mutation mechanism. This subset of retroviruses
includes the mouse mammary tumor virus (MMTV). MMTV-induced tumors were used
to implement the HPT process.

In tumors induced by provirus insertion mutation, new proviral integrants
present at clonal stoichiometries tag the locations of host protooncogenes. The
majority of such integrants fall outside of transcribed regions. However, a subset fall
within sequences that are transcribed, and result in the formation of chimeric
transcripts containing both host and virus sequences. HPT is designed to recover
host/virus junction sequences from these chimeric transcripts.

The strategy used is a modified/optimized anchored-PCR (A-PCR) approach
incorporating a custom anchor. The procedure amplifies host sequences upstream of
5' LTRs. If a transcript containing a host/virus junction is present in a tumor, then a
unique fragment is generated by the A-PCR procedure, which can be detected by gel
electrophoresis. In addition, one or more common fragments will be generated from
retroviral transcripts that contain the 5' end of the 3' LTR.
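
The distinction between common and unique fragments can be sketched in a few lines. This is an illustration only. It assumes that each lane has already been reduced to a list of band sizes in bp, that a band seen in every lane is a common retroviral product, and that sizes can be compared exactly. The function name and the sample data are hypothetical.

```python
from typing import Dict, List


def candidate_bands(lanes: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Return, per sample, the bands that are not shared by every lane.

    Bands present in all lanes are treated as common retroviral products
    (an assumption of this sketch); the remainder are candidate junctions.
    """
    if not lanes:
        return {}
    common = set.intersection(*(set(bands) for bands in lanes.values()))
    result: Dict[str, List[int]] = {}
    for sample, bands in lanes.items():
        unique = sorted(set(bands) - common)
        if unique:
            result[sample] = unique
    return result


# Hypothetical gel readout: the 900 bp band stands in for a common product.
example_lanes = {
    "MM0001": [900, 101],
    "MM0003": [900],
    "MM0006": [900, 46],
}
candidates = candidate_bands(example_lanes)
```

In practice band sizes read from a gel carry measurement error, so a real implementation would compare sizes within a tolerance rather than exactly.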

The innovation that makes this approach feasible is that cDNAs are digested
with a restriction enzyme with a 4 bp recognition sequence prior to amplification. This
generates populations of target cDNAs that (1) have precise 5' ends, and (2) are
sufficiently small to ensure that they will efficiently amplify. In addition, restriction
enzymes are selected that produce the largest possible retroviral transcription
products (so that they run at the top of the gel). This is critical because chimeric
transcripts are present at much lower levels than the major retroviral transcripts. By
selection of appropriate restriction enzymes, a large detection window is available in a
region of the gel where the signal-to-noise ratio is most favorable. In addition, during
amplification, cycling times are ramped to favor smaller products.
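
To illustrate why digestion gives each target a precise 5' end, the following sketch locates the recognition site nearest to a host/virus junction and returns the host sequence between that site and the junction. It is a simplified model, not the disclosed protocol. The recognition sequence CATG is an assumption: the enzyme is not named in this text, although most of the tags listed in Appendix F begin with CATG. The fragment is taken to start at the first base of the site rather than at the true cleavage position, and the example sequences are invented.

```python
def upstream_fragment(cdna: str, junction: int, site: str = "CATG") -> str:
    """Return host sequence from the nearest `site` upstream of `junction`.

    `junction` is the index of the first viral base in `cdna`. The default
    site "CATG" is an assumption made for illustration. An empty string is
    returned when no site lies upstream of the junction.
    """
    cdna = cdna.upper()
    # rfind searches cdna[0:junction], so the whole site must be host sequence.
    start = cdna.rfind(site, 0, junction)
    if start == -1:
        return ""
    return cdna[start:junction]


# Invented host and LTR sequences, for illustration only.
host = "AACATGTTTCATGGCGAGATTC"
ltr = "GCAACAGTCC"
tag = upstream_fragment(host + ltr, len(host))
```

Because a 4 bp site occurs on average once every 256 bp in random sequence, the nearest site is usually close to the junction, which keeps the amplified targets small.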

The provirus tagging strategy has been used for almost 20 years. It is a
DNA-based detection method where identification of new genes requires positional cloning
procedures to find genes adjacent to integration sites recovered from tumor DNA. This
laborious process has been recently improved by PCR procedures. Nevertheless,
unless the integration falls within known sequence, it is not possible to identify the
affected gene without a large amount of additional work.

The advantage of HPT is that it is the first PCR-based provirus tagging
approach that recovers protooncogenes from RNA. Because RNA is used, new
protooncogenes are identified directly. Although only a fraction of tumors have
insertion mutations that generate a chimeric transcript, the process has been designed
to be high-throughput. As a consequence, the fact that most samples are
non-informative is not a problem. In addition, the process is so efficient that recovery of
known protooncogenes does not represent an unacceptable loss of effort, and, in fact,
serves as an internal control to verify the robustness of the strategy.

I claim:

1. A method of identifying protooncogenes comprising:

inserting a provirus into the genome of a host forming a junction site of 
DNA of said virus and said host; 
isolating mRNA from said host; 
preparing cDNA from said mRNA; 
amplifying said cDNA to identify the nucleic acid sequence of said 
junction site, whereby said candidate target gene is identified.

APPENDIX A



FIGURE 1 - CONCEPTUAL DIAGRAM 
The HPT process can detect insertion mutations in either orientation anywhere in a
transcribed sequence (including introns). Figure 1 illustrates an integration in the 3'
untranslated region of a hypothetical protooncogene. This is the most common type of
insertion mutation detected by HPT in the MMTV system.

APPENDIX B



FIGURE 2 - FLOW CHART 
FIGURE 3 - EXAMPLE OF HPT SCREENING DATA

APPENDIX D



A. General purpose or utility 

HPT is a new technology for isolating partial cDNAs representing functionally validated
protooncogenes. It is a scalable batch process that is amenable to high-throughput
applications. Saturation mutagenesis of all accessible protooncogenes in the mammalian
genome is feasible using HPT.

B. Brief description of the state of the art prior to your invention 

The state of the art prior to this invention was PCR-based isolation of provirus integration
sites from tumor DNA. This method, as currently practiced, involves an inverse-PCR (I-PCR)
strategy. Identification of the activated protooncogene at a particular integration locus
relies on prior characterization of the gene by other methods. For novel genes, positional
cloning is required. The DNA-based method involves considerable risk, since it is not known
until the end of the positional cloning process whether the locus under investigation is novel.



C. Technical description

1. Isolate total RNA from frozen tumor tissue.
2. Treat with DNase.
3. Prepare double stranded cDNA.
4. Digest with restriction enzyme.
5. Ligate anchor to digested cDNA.
6. PCR amplify targets with LTR and anchor primers.
7. Reamplify targets with nested LTR and anchor primers.
8. Electrophorese amplification products.
9. Sample new band, if present.
10. Reamplify band.
11. Clone.
12. Determine sequence.
13. Assign CTT number.
14. Perform homology search.
15. If sequence is anonymous, design primers for fingerprinting.
16. Use primers to amplify BAC and YAC superpools.
17. Electrophorese to determine banding pattern for the locus (fingerprint).
18. Assemble into linkage groups.



CTTs can be assembled into linkage groups based on their fingerprints. Using a
representative CTT from each linkage group, conventional techniques can then be used to
isolate full-length cDNAs for sequence analysis and deduction of the amino acid
sequence of the protooncogene.
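
The assembly of CTTs into linkage groups can be sketched as follows. This is a minimal illustration under stated assumptions: a fingerprint is recorded as the set of superpool identifiers that gave a band, and two CTTs are placed in the same group only when their fingerprints are identical. The matching rule actually used is not specified in this text, and the pool names and fingerprint data below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def linkage_groups(fingerprints: Dict[str, Set[str]]) -> List[List[str]]:
    """Group CTT identifiers whose superpool fingerprints are identical."""
    groups = defaultdict(list)
    for ctt, pools in fingerprints.items():
        # frozenset makes the fingerprint hashable and order-independent.
        groups[frozenset(pools)].append(ctt)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical fingerprints: positive superpools for each CTT.
example_fingerprints = {
    "CTT0007": {"P03", "P11"},
    "CTT0016": {"P11", "P03"},
    "CTT0005": {"P07"},
}
groups = linkage_groups(example_fingerprints)
```

A representative CTT can then be taken from each group, for example the first member of each sorted list.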
Shown are the A-PCR results from an HPT analysis of 48 independent MMTV-induced
tumors performed as described in the previous sections. Tabulated below the gel are the
results of a BLAST search using the CTT from each recovered junction fragment (boxes). In
addition to novel sequences, known targets of MMTV insertion mutation were recovered.
Also recovered were known protooncogenes not previously recognized as targets of
MMTV and known genes that had not previously been known to have protooncogene
function. The processing time from frozen tissue to cDNA sequence is five work days.

E. Possible modifications and variations on the best way 

1. A partial digestion strategy is being implemented to recover more chimeric 
transcript sequence from loci which have CTTs that are too short to BLAST and/or 
contain low complexity or repetitive sequences. This will allow usable sequence to 
be recovered upstream of most CTTs currently listed as "unusable". 

2. The HPT process has been implemented to recover host/virus junction fragments
from integrants in the same transcriptional orientation as the target gene using
minus strand primers from the 5' LTR. It is also possible to modify the procedure to
recover host/virus junctions from integrants in the opposite orientation using plus
strand primers from the 3' LTR.

3. The current procedure generates retroviral transcripts that run high in the gel so that 
novel host/virus junctions are clearly visible. It is also possible to remove, destroy, 
and/or inhibit the formation of retroviral transcripts.

4. The process claim can be generalized to include any method that uses a chimeric
mRNA between a retrovirus and cellular gene to discover a gene of interest based 
on either an in vivo or cell culture bioassay. 

F. Advantages and improvements over existing practice

The primary advantage and improvement over the existing state of the art is that the
affected protooncogene is specifically recovered by the HPT process. Using DNA-based
approaches, positional cloning is required to find the protooncogenes at loci that have not
previously been characterized.

The following features are believed to be new: 

1. First cDNA-based application of provirus tagging using PCR methods.

APPENDIX E



GLOSSARY 

CTT chimeric transcript tag 

HPT high-throughput protooncogene tagging 

LTR long terminal repeat 

MMTV mouse mammary tumor virus

APPENDIX F



CTTs FROM FIGURE 3

CTT       SAMPLE   BAND   SIZE   BLAST RESULTS
CTT0001   MM0001                 Wnt3
CTT0002   MM0003                 Fgf3
CTT0003   MM0006                 Fgf8
CTT0004   MM0009                 hs2llgf2
CTT0005   MM0010   A      79     novel
CTT0006   MM0029                 Fgf3
CTT0007   MM0034          260    Fgf3
CTT0008   MM0038          45     Myb
CTT0009   MM0045          31     unusable
CTT0010   MM0064   A      46     novel
CTT0011   MM0065   A      201    novel
CTT0012   MM0078                 unusable
CTT0013   MM0084                 Sp100
CTT0014   MM0084          16     unusable
CTT0015   MM0092          22     WntZa
CTT0016   MM0094   A      322    Fgf3
CTT0017   MM0103   A      48     novel
CTT0018   MM0134   A      158    Wnt1
CTT0019   MM0250                 unusable



CTT0001

CATGGCGAGA TTCTGTGTCC AAGCTGCCTC TACTCGTGAC ATTCCAAGAT GCCTCTGAGG
TGGGAACTGT GAAATAGGAC AGAGCCCCAC AGTCCCCTCT T



CTT0002

?og5c ACTTT gtctacca ^ gccactccaa gcacccagct GCATACAGGT 



CTT0003 

CATGCTGGCT GTTCCTGCAG CCCAGCTACT GGGACAATCT GGAAAC 






WO 02/057497 



PCT/US02/01651 



CTT0004 

CATGTGCTCA ATCCATAG 

CTT0005 

CATGGGTCCC TGAAGGGTCT CTCCTTTAGC AAACCCCTGT ACAGTTGAAG TGATTTTTCA 
GGTACCCATT GGTCTTAGC 

CTT0006 

CATGGCAAGA TGGAGACTTT GTCTACCAGG GCCACTCCAA GCACCCAGCT G 

CTT0007 

CATGCACACA AACTGGCCCT GAACTTTTGA CTTCCAGGCC TCTGCCTCTC TGCGCGCACA
CACACACTCG CACTCCTGTA TATGAAGCGT ATATGTGTTT CTCTGGGAAC TGTTTTTATC
AGGTGAAGTA CTTCCTTTGT TCTTGCTACC CACCTCCAGG GCTCCAGGAT CTCCAGACAG
CCAACCCTAA GACAGGCCCA GCTTCTCTGT ATCTCTGTGA TGAGAACCTT GGCATAGAGC
TGCCTCACCC TCGGGATAGG 

CTT0008

CATGCCTCTG GAAAGTACCT TAAACATAGA ATCCCCTCCC TAGTG 

CTT0009 

CATGGTTTTT TTTTTTTTGA GTGTGTGTGT G 

CTT0010 

CATGCAGATT AAAGTACATA TATGTAAAAA ATAAAAATAA ATCTTT 

CTT0011

CATGATAAGG TTAGAGTTTT GTGAGCCTCC TTAACCTTGC TCAGCAAGCG TTGGGCTCTT
GGCAGCCGAG CTGCCATCTT TCTCATCCCC GATAGAGCCA GCCGCCCTTG TCGTGTCTTG
AATAAGTTAG AGGAGGCATT ATAGAGCGGA CCTAAACATT TGCCTTGGAG CCTGAGGGAT
GGGGATTGGC TGAATGTGAA T




CTT0012

CA

CTT0013

CATGAATTCA
TTACACTTGC
GC
TCACTGGTAA AATGTATGAA TTTCTTCTGA GACAGAGTCT
TTCGAGCGGA TGATTCTGCT GCTTCAGCCT CTTGAGATar
TCTTATTGGC

CTT0014

CATGGATGCT ATTGGG

CTT0015

CATGAGAGGG TGCTTCAGGG TG





CTT0016 



CATGCACACA AACTGGCCCT GAACTTTTGA CTTCCAGGCC TCTGCCTCTC TGCGCGCACA 
CACACACTCG CACTCCTGTA TATGAAGCGT ATATGTGTTT CTCTGGGAAC TGTTTTTATC
AGGTGAAGTA CTTCCTTTGT TCTTGCTACC CACCTCCAGG GCTCCAGGAT CTCCAGACAG 
CCAACCCTAA GACAGGCCCA GCTTCCTCTG TATCTCTGTG ATGAGAACCT TGGCATAGAG 
CTGCCCTCAC CCTCGGGATA GGGCTTATGT TCCCCGGAAC GAGCCAGGCA CCTCAACAGC 
TCCTGGGGAG GAATAGGGGA CT 



CTT0017 

CATGAATTCC ACACCTCCAT CAAGGGTGTC TTCTCCAGTG AGCCCCGG 



CTT0018 

CATGCCTCCC TCAGCCTCCT CCCACCCCTT CCTGTCCTGC CTCCTCATCA CTGTGTAAAT
AATTTGCACC GAAATGTGGC CGCAGAGCCA CGCGTTCGGT TATGTAAATA AAACTATTTA 
TTGTGCTGGG TTCCAGCCTG GGTTGCAGAG ACCACCCT 
CA

APPENDIX G

NEW CANDIDATE PROTOONCOGENES 

This group includes all novel CTT sequences ≥ 20 bp. Additional sequences are pending.



CTT LOCUS   SAMPLE   BAND   SIZE
CTT0005     MM0010   A      79
CTT0010     MM0064   A      46
CTT0011     MM0065   A      201
CTT0017     MM0103   A      48
CTT0020     MM0154   A      68





CTT0005 

See Appendix F 

CTT0010 

See Appendix F 

CTT0011 

See Appendix F 

CTT0017 

See Appendix F 

CTT0020 

CATGCTAATG GAGTTTATTC TTAGGACTGC CTCCTGCATC CATTGATTGA CTTAAATATG 
TGCACACT 